Abstract

In any Markov chain Monte Carlo analysis, rapid convergence of the chain to its target probability
distribution is of practical and theoretical importance. A chain that converges at a geometric rate is
geometrically ergodic. In this paper, we explore geometric ergodicity for two-component Gibbs samplers
which, under a chosen scanning strategy, evolve by combining one-at-a-time updates of the two
components. We compare convergence behaviors between and within three such strategies: composition,
random sequence scan, and random scan. Our main results are twofold. First, we establish that
if the Gibbs sampler is geometrically ergodic under any one of these strategies, so too are the others.
Further, we establish a simple and verifiable set of sufficient conditions for the geometric ergodicity of
the Gibbs samplers. Our results are illustrated using two examples.

1 Introduction

Providing a framework for approximately sampling from complicated target probability distributions,
Markov chain Monte Carlo (MCMC) methods facilitate statistical inference in intractable settings.
Consider a distribution $\varpi$ with support on some general state space in $\mathbb{R}^d$. Implementation of
the foundational Metropolis-Hastings MCMC algorithm for $\varpi$ requires full-dimensional draws from an
approximating proposal distribution. However, in settings requiring MCMC, $\varpi$ is typically complicated
or $d$ is large. Thus, constructing an appropriate proposal can be prohibitively difficult. In such cases, we
might instead employ a component-wise strategy which updates the variables of $\varpi$ one variable or
sub-block of variables at a time.

The fundamental component-wise MCMC algorithm, the Gibbs sampler (GS), evolves by updating
each sub-block or component with draws from its conditional distribution given the other components.
For example, suppose we block the variables of $\varpi$ into two components, $X \in \mathbb{R}^{d_x}$ and
$Y \in \mathbb{R}^{d_y}$, where $d_x, d_y \ge 1$ and $d_x + d_y = d$. Let $\pi(x,y)$ denote the corresponding
two-component density admitted by $\varpi$ with respect to measure $\mu = \mu_x \times \mu_y$ and having support
$\mathsf{X} \times \mathsf{Y} \subseteq \mathbb{R}^{d_x} \times \mathbb{R}^{d_y}$. Then the GS Markov chain
$\Phi := \{(X^{(0)},Y^{(0)}), (X^{(1)},Y^{(1)}), (X^{(2)},Y^{(2)}), \ldots\}$ evolves by drawing updates of $X$ and $Y$
from the conditional densities $\pi(x|y) := \pi(x,y) / \int \pi(x,y)\,\mu_x(dx)$ and
$\pi(y|x) := \pi(x,y) / \int \pi(x,y)\,\mu_y(dy)$, respectively. Let $P^n((x,y),A)$ denote the corresponding
$n$-step Markov transition kernel where, for state $(x,y) \in \mathsf{X} \times \mathsf{Y}$, set $A$ in the Borel
$\sigma$-algebra $\mathcal{B}$ on $\mathsf{X} \times \mathsf{Y}$, and $n, i \in \mathbb{Z}^+$,

$$P^n((x,y),A) = \Pr\left( (X^{(i+n)},Y^{(i+n)}) \in A \mid (X^{(i)},Y^{(i)}) = (x,y) \right) .$$

When the GS is Harris ergodic (i.e., $\pi$-irreducible, aperiodic, and Harris recurrent with invariant
density $\pi$; Meyn and Tweedie, 1993), $\Phi$ converges to $\varpi$ in total variation distance. That is,

$$\| P^n((x,y),\cdot) - \varpi(\cdot) \| := \sup_{A \in \mathcal{B}} | P^n((x,y),A) - \varpi(A) | \to 0 \quad \text{as } n \to \infty .$$

Understanding the rate of this convergence is paramount in evaluating the quality of Markov chain output.
To this end, we say $\Phi$ is geometrically ergodic if there exist some function
$M : \mathsf{X} \times \mathsf{Y} \to \mathbb{R}$ and constant $t \in (0,1)$ for which

$$\| P^n((x,y),\cdot) - \varpi(\cdot) \| \le t^n M(x,y) \quad \text{for all } (x,y) \in \mathsf{X} \times \mathsf{Y} . \tag{1}$$

A geometric convergence rate is crucial for several reasons, not least of which is achieving effective
simulation results in finite time. Perhaps most importantly, geometric ergodicity ensures that the same tools
used for evaluating estimators in the independent and identically distributed sampling setting also exist
in the GS setting. Specifically, suppose we wish to calculate
$E_\varpi(g) := \iint g(x,y)\pi(x,y)\,\mu_x(dx)\mu_y(dy)$ for $g : \mathsf{X} \times \mathsf{Y} \to \mathbb{R}$.
Under Harris ergodicity, the Monte Carlo estimate $\bar{g}_n := n^{-1}\sum_{i=0}^{n-1} g(X^{(i)},Y^{(i)})$ converges
to $E_\varpi(g)$ with probability one as $n \to \infty$. Further, if $E_\varpi|g|^{2+\delta} < \infty$ for some
$\delta > 0$, geometric ergodicity ensures the existence of a Markov chain Central Limit Theorem (CLT)

$$\sqrt{n}\left(\bar{g}_n - E_\varpi(g)\right) \xrightarrow{d} N(0, \sigma_g^2) \quad \text{as } n \to \infty \tag{2}$$

for $0 < \sigma_g^2 < \infty$ (Jones, 2004). Under these same conditions, batch means, spectral methods and
regenerative simulation methods provide asymptotically valid Monte Carlo standard errors for $\bar{g}_n$,
$\sigma_g/\sqrt{n}$ (Atchade, 2011; Flegal and Jones, 2010; Hobert, Jones, Presnell, and Rosenthal, 2002;
Jones, Haran, Caffo, and Neath, 2006). In turn, we can rigorously assess the accuracy of $\bar{g}_n$ and
determine a sufficient simulation length $n$ (Flegal and Jones, 2010; Flegal, Haran, and Jones, 2008).

Accordingly, our goal is to explore geometric ergodicity for the two-component GS. Studying this
special case is a crucial first step in understanding convergence for GS with multiple components and
has many practical applications. For instance, two-component GS serves as the foundation of data
augmentation methods and can be used to explore such practically relevant models as the Bayesian
general linear model in Johnson and Jones (2010). Our work in this GS setting is twofold. First, we explore
convergence behavior under three different GS scanning strategies: composition, random sequence scan,
and random scan. In particular, we establish that if the GS under any one of these strategies is
geometrically ergodic, they all are. These results fill in gaps left by Johnson, Jones, and Neath (2013), who
explore convergence of component-wise samplers in the general setting. Second, we provide a simple set of
sufficient conditions for the geometric ergodicity of the GS. Such conditions exist for selected
model-specific settings (see, for example, Diaconis, Khare, and Saloff-Coste (2008a,b); Hobert and Geyer
(1998); Johnson and Jones (2010); Jones and Hobert (2004); Roberts and Rosenthal (1998)). However, there
is a lack of verifiable conditions that can be utilized in general settings. For example, though Geman
and Geman (1984) and Liu, Wong, and Kong (1995) provide general, sufficient conditions for geometric
ergodicity, Geman and Geman only consider GS on finite state spaces and the conditions in Liu et al. are
admittedly difficult to establish in practice. Further, in their Proposition 1, Tan, Jones, and Hobert
(2011) note the need for a drift condition, but stop short of providing guidance on how to construct such
a condition.



We begin in Sections 2.1 and 2.2 with an overview of GS and geometric ergodicity, respectively.
In Section 3.1 we explore geometric convergence of the GS under different scanning strategies and, in
Section 3.2, present sufficient conditions for geometric ergodicity. Finally, we illustrate our results using
two examples in Section 4.



2 Background 

2.1 The Gibbs Sampler 

Consider the two-component GS Markov chain
$\Phi := \{(X^{(0)},Y^{(0)}), (X^{(1)},Y^{(1)}), (X^{(2)},Y^{(2)}), \ldots\}$. In general, $\Phi$ evolves by
drawing $X$ and $Y$ updates from the full conditional densities $\pi(x|y)$ and $\pi(y|x)$,
respectively. However, the order and frequency of component-wise updates depend on the chosen scanning
strategy. Three fundamental strategies are composition (CGS), random sequence scan (RQGS), and
random scan (RSGS).

First, in every iteration of the CGS, $X$ and $Y$ are updated in a fixed, predetermined order. Without
loss of generality, we assume throughout that $X$ is updated first. Thus the CGS Markov kernel $P_{CGS}$
admits Markov transition density (Mtd)

$$k_{CGS}((x,y),(x',y')) = \pi(x'|y)\,\pi(y'|x') .$$

Specifically, $P_{CGS}((x,y),A) = \iint_A k_{CGS}((x,y),(x',y'))\,\mu_x(dx')\mu_y(dy')$. RQGS also updates both
$X$ and $Y$ in each iteration. However, the update order is randomly selected according to sequence selection
probability $q \in (0,1)$. Letting $q$ be the probability of updating $X$ first and $1-q$ the probability of
$Y$ first, the RQGS Markov kernel $P_{RQGS,q}$ admits Mtd

$$k_{RQGS,q}((x,y),(x',y')) = q\,\pi(x'|y)\,\pi(y'|x') + (1-q)\,\pi(y'|x)\,\pi(x'|y') .$$

Thus the RQGS is essentially a mixture of the two possible composition scan GS (that which first updates
$X$ and that which first updates $Y$). Moreover, when $q$ is close to 1, the RQGS behaves much like the
CGS which first updates $X$.

Finally, unlike CGS and RQGS, RSGS randomly selects a single component for update in each iteration
while fixing the other. Letting component selection probability $p \in (0,1)$ be the probability of updating
$X$ and $1-p$ the probability of updating $Y$, the RSGS Markov kernel $P_{RSGS,p}$ admits Mtd

$$k_{RSGS,p}((x,y),(x',y')) = p\,\pi(x'|y)\,\delta(y'-y) + (1-p)\,\pi(y'|x)\,\delta(x'-x)$$

where $\delta$ is Dirac's delta. Thus for $p$ close to 1, the RSGS will produce many $X$ updates but just as
many repeat copies of $Y$. The opposite is true for $p$ close to 0.
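
The three scanning strategies differ only in how one iteration combines the two conditional draws. The following is a minimal sketch of a single iteration under each strategy, assuming the user supplies functions that draw from the two full conditionals. The function names and the bivariate Normal target (correlation 0.5) are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import random

def cgs_step(x, y, draw_x, draw_y, rng):
    """Composition scan: update X given y, then Y given the new x."""
    x_new = draw_x(y, rng)
    y_new = draw_y(x_new, rng)
    return x_new, y_new

def rqgs_step(x, y, draw_x, draw_y, q, rng):
    """Random sequence scan: update both components, X first with probability q."""
    if rng.random() < q:
        x_new = draw_x(y, rng)
        y_new = draw_y(x_new, rng)
    else:
        y_new = draw_y(x, rng)
        x_new = draw_x(y_new, rng)
    return x_new, y_new

def rsgs_step(x, y, draw_x, draw_y, p, rng):
    """Random scan: update X only with probability p, otherwise Y only."""
    if rng.random() < p:
        return draw_x(y, rng), y
    return x, draw_y(x, rng)

# Hypothetical target: standard bivariate Normal with correlation RHO,
# whose full conditionals are N(RHO * other, 1 - RHO^2).
RHO = 0.5
SD = (1 - RHO ** 2) ** 0.5

def draw_x(y, rng):
    return rng.gauss(RHO * y, SD)

def draw_y(x, rng):
    return rng.gauss(RHO * x, SD)

def run_chain(step, n_iter, seed=0, **kwargs):
    """Run a chain from (0, 0); kwargs carries q or p where needed."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    path = []
    for _ in range(n_iter):
        x, y = step(x, y, draw_x, draw_y, rng=rng, **kwargs)
        path.append((x, y))
    return path

cgs_path = run_chain(cgs_step, 20000)
rqgs_path = run_chain(rqgs_step, 20000, q=0.5)
rsgs_path = run_chain(rsgs_step, 40000, p=0.5)  # twice as long: one update per iteration
```

In the RSGS path, consecutive states share one coordinate exactly, which is the "repeat copies" behavior described above.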

Though CGS may be more familiar to readers, there are certain advantages to considering RSGS and
RQGS. For example, it is easy to show that RSGS is reversible with respect to $\pi$ for all $p$ and RQGS is
reversible for $q = 1/2$. This, among other advantages, weakens the conditions for a CLT (Jones, 2004).
Specifically, (2) holds if $g$ has a finite second moment. In the reversible setting, we can also compare and
measure the quality of the GS through Peskun ordering and variance bounding properties (Roberts and
Rosenthal, 2008).



2.2 Establishing Geometric Ergodicity 

Studying convergence properties of the GS requires a few definitions. First, let $P$ denote a generic GS
Markov kernel (CGS, RQGS, or RSGS) with Mtd $k$. We say a drift condition holds if there exist some
drift function $V : \mathsf{X} \times \mathsf{Y} \to [1,\infty)$, drift rate $0 < \lambda < 1$, and constant
$b < \infty$ such that

$$PV(x,y) \le \lambda V(x,y) + b \quad \text{for all } (x,y) \in \mathsf{X} \times \mathsf{Y} \tag{3}$$

where, here applied to a function, $P$ acts as an operator with

$$PV(x,y) := E\left[ V(X^{(i+1)},Y^{(i+1)}) \mid (X^{(i)},Y^{(i)}) = (x,y) \right]
= \iint_{\mathsf{X} \times \mathsf{Y}} V(x',y')\,k((x,y),(x',y'))\,\mu_x(dx')\mu_y(dy') . \tag{4}$$



We say $V$ is unbounded off compact sets if the set $D_d := \{(x,y) : V(x,y) \le d\}$ is compact for all
$d > 0$. Together, if (3) holds for some $V$ that is unbounded off compact sets, $\Phi$ will "drift" toward
values of $(x,y)$ for which $V(x,y)$ is small (i.e., close to 1). (See Jones and Hobert (2001) for an in-depth
discussion.) The rate of this drift is captured by $\lambda$; the smaller the $\lambda$, the quicker the drift.
Thus, smaller $\lambda$ are loosely indicative of quicker convergence. In fact, Markov chain drift is a
sufficient condition for geometric ergodicity. The following proposition follows from Lemma 15.2.8 and
Theorems 6.0.1 and 15.0.1 of Meyn and Tweedie (1993).






Proposition 1. Suppose the support of $\pi$ has non-empty interior and Markov chain $\Phi$ is Harris ergodic
and Feller, that is, for any open set $O \in \mathcal{B}$

$$\liminf_{n \to \infty} P((x_n,y_n),O) \ge P((x,y),O) \quad \text{for } (x_n,y_n) \to (x,y) \text{ in } \mathsf{X} \times \mathsf{Y} .$$

Then, if drift condition (3) holds for some $V$ that is unbounded off compact sets, $\Phi$ is geometrically
ergodic.



3 Geometric Ergodicity of the Gibbs Sampler 

Our main goal is to explore geometric ergodicity within and between the CGS, RQGS, and RSGS. To this
end, we investigate the impact of GS scanning strategy on achieving geometric convergence in Section 3.1
and provide sufficient and verifiable conditions for the geometric ergodicity of the GS in Section 3.2.



3.1 Geometric Ergodicity Under Different Scanning Strategies 

GS convergence rates depend on both target distribution $\varpi$ and scanning strategy. Though different
scanning strategies can produce chains with differing asymptotic behaviors, the common building blocks of
the CGS, RQGS, and RSGS (namely, the full conditional distributions used for component-wise updates)
suggest there should also be links among their convergence properties. In this section we address three
main questions: (Q1) Does geometric ergodicity of any one of CGS, RQGS, or RSGS guarantee the
same for the others?; (Q2) Does geometric ergodicity of the RQGS using sequence selection probability $q$
guarantee the same for all other selection probabilities?; and (Q3) Does geometric ergodicity of the RSGS
using component selection probability $p$ guarantee the same for all other selection probabilities? Under
a set of conditions on target density $\pi$ and the GS, the answer to all three of these questions is yes. We
call this set of conditions Assumption A, which is satisfied if

(a) the CGS, RQGS, and RSGS are Harris ergodic;

(b) the support of $\pi$ has non-empty interior with respect to $\mu_x \times \mu_y$; and

(c) for all $(x,y), (x_n,y_n) \in \mathsf{X} \times \mathsf{Y}$

$$\pi\left( y \,\Big|\, \liminf_{n \to \infty} x_n \right) \le \liminf_{n \to \infty} \pi(y \mid x_n)
\quad \text{and} \quad
\pi\left( x \,\Big|\, \liminf_{n \to \infty} y_n \right) \le \liminf_{n \to \infty} \pi(x \mid y_n) . \tag{5}$$

We believe that Assumption A does not significantly restrict the usefulness of our results. First, (a)
and (b) are required by Proposition 1, where (a) is also a standard assumption for any Markov chain.
Further, (b) and (c) are satisfied by a wealth of target densities on general state spaces explored by GS
in practice. For example, (b) should hold in most cases where the maximal irreducibility measure for $\Phi$
is Lebesgue and (c) holds when $\pi$ is continuous. In fact, (c) is simply a sufficient condition for the GS to
be Feller. A proof of the following lemma is given in the appendix.

Lemma 1. If (5) holds for all $(x,y), (x_n,y_n) \in \mathsf{X} \times \mathsf{Y}$, then the CGS, RQGS, and RSGS
are Feller.



With these assumptions in place, the following theorem from Johnson et al. (2013) will be critical in
addressing (Q1), (Q2), and (Q3).

Theorem 1. Under Assumption A, if CGS is geometrically ergodic, so are the RQGS for all sequence
selection probabilities $q$ and the RSGS for all component selection probabilities $p$.

This result captures a clear connection between the convergence behavior of the CGS, RQGS, and
RSGS. However, it fails to address (Q2) and (Q3). It also provides only an incomplete look into (Q1).
Specifically, Theorem 1 proves that geometric ergodicity of the RQGS and RSGS follows from that of the CGS,
but not the converse. We fill in these gaps below, starting with an exploration of the RQGS. All proofs
can be found in the appendix.

Theorem 2. Under Assumption A, if RQGS is geometrically ergodic for some sequence selection
probability $q \in (0,1)$, then so is the CGS.

Corollary 1 follows directly from Theorems 1 and 2.

Corollary 1. Under Assumption A, if RQGS is geometrically ergodic for some sequence selection
probability $q \in (0,1)$, it is geometrically ergodic for all $q \in (0,1)$.

The results of Theorem 2 and Corollary 1 are, perhaps, intuitive. It is well known that the
two-component CGS updating $X$ then $Y$ has the same convergence rate as that updating $Y$ then $X$. Thus,
if some mixture of these samplers (i.e., RQGS) is geometrically ergodic, so too should be the individual
components. Further, these results confirm that if some mixture of the geometrically ergodic CGS is
geometrically ergodic, then all possible mixtures are geometrically ergodic. Next, we establish similar
results for the RSGS.






Theorem 3. Under Assumption A, if RSGS is geometrically ergodic for some component selection
probability $p$, then so is the CGS.

Corollary 2 follows directly from Theorems 1 and 3.

Corollary 2. Under Assumption A, if RSGS is geometrically ergodic for some component selection
probability $p \in (0,1)$, it is geometrically ergodic for all $p \in (0,1)$.

Consider Theorem 3. It is natural to believe that if the RSGS converges at a geometric rate by
updating a single component in each iteration, so too should the CGS which updates both components
in each iteration. The result of Corollary 2, on the other hand, might be more surprising. In its extreme,
this corollary asserts that if a RSGS updating $X$ with high frequency ($p \approx 1$) is geometrically ergodic,
so is the RSGS updating $X$ with low frequency ($p \approx 0$). In other words, if a chain converges quickly by
spending the majority of its effort exploring one component of the state space while getting stuck in the
other, so too will it converge quickly by spending its effort exploring the other component of the state
space.

Finally, combining the above theorems establishes Theorem 4, our main result.

Theorem 4. Under Assumption A, suppose any one of the CGS, RQGS, or RSGS is geometrically
ergodic. Then so are the others, regardless of RQGS and RSGS selection probabilities $q$ and $p$, respectively.

It is important to note that Theorem 4 does not assert that the CGS, RQGS, and RSGS converge
at the same rate. In fact, if these samplers satisfy (1) for different $t$ and $M(\cdot)$, their exact convergence
rates, though all geometric, may significantly differ. The same is true within the RQGS and RSGS under
different selection probabilities $q$ and $p$, respectively. Thus choice of scanning strategy and choice of $q$ and
$p$ within RQGS and RSGS may impact the empirical performance of a finite GS simulation. Assuredly,
whether the geometric convergence is relatively fast or slow, the existence of a Markov chain CLT (2)
provides a means for rigorously assessing the quality of MCMC inference. Though not the focus of this
paper, we explore the impact of scanning strategy on finite simulation quality with a short study in
Section 4.1. For a more in-depth discussion of the impact of $p$ in RSGS, please see Levine and Casella
(2006); Levine, Yu, Hanley, and Nitao (2005); Liu et al. (1995), and see Johnson et al. (2013) for further
discussion of comparisons between CGS, RSGS, and RQGS.

3.2 Sufficient Conditions for Geometric Ergodicity

We end this section with a simple set of sufficient conditions for the geometric ergodicity of the GS.
By no means are these conditions exhaustive. Our goal is merely to provide guidance for those new to
establishing geometric ergodicity. A proof of Theorem 5 is provided in the appendix. We recommend
inspection of this proof to develop intuition for establishing geometric ergodicity.

Theorem 5. Suppose Assumption A holds and that there exist functions $f : \mathsf{X} \to [1,\infty)$ and
$g : \mathsf{Y} \to [1,\infty)$ and constants $j, k, m, n > 0$ such that $jm < 1$ and

$$E[f(x) \mid y] \le j\,g(y) + k , \qquad E[g(y) \mid x] \le m\,f(x) + n . \tag{6}$$

Then if $C_d := \{y : g(y) \le d\}$ is compact for all $d > 0$, the CGS, RQGS, and RSGS are geometrically
ergodic.
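
As a brief sketch of how the two inequalities in (6) combine into a drift condition, consider the CGS with the candidate drift function $V(x,y) = g(y)$. Since the CGS first draws $x'$ given $y$ and then $y'$ given $x'$, applying the second inequality in (6) and then the first gives:

```latex
\begin{align*}
P_{CGS} V(x,y) &= E\big[\, E[\, g(y') \mid x' \,] \,\big|\, y \,\big] \\
               &\le E\big[\, m f(x') + n \,\big|\, y \,\big] \\
               &\le m \big( j\, g(y) + k \big) + n \\
               &= jm\, V(x,y) + (mk + n) .
\end{align*}
```

Because $jm < 1$, this has the form of drift condition (3) with drift rate $jm$ and constant $mk + n$.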

Lemma 2 follows directly from the proof of Theorem 5.

Lemma 2. Under the assumptions of Theorem 5, CGS, RQGS, and RSGS drift conditions (3) can be
constructed as follows. For CGS,

$$P_{CGS}V_{CGS}(x,y) \le \lambda_{CGS}V_{CGS}(x,y) + b_{CGS}$$

holds for $V_{CGS}(x,y) = g(y)$, $jm \le \lambda_{CGS} < 1$, and $b_{CGS} = mk + n$. For RQGS with sequence
selection probability $q$, define

$$v_{RQGS,q} = \frac{(2q-1)jm + \sqrt{jm\,(jm + 4q(1-q)(1-jm))}}{2(1-q)m} .$$

Then

$$P_{RQGS,q}V_{RQGS}(x,y) \le \lambda_{RQGS}V_{RQGS}(x,y) + b_{RQGS}$$

holds for $V_{RQGS}(x,y) = f(x) + v_{RQGS,q}\,g(y)$,

$$b_{RQGS} = q\left[k + v_{RQGS,q}(mk + n)\right] + (1-q)\left[v_{RQGS,q}\,n + (jn + k)\right] , \text{ and}$$

$$(1-q)(j + v_{RQGS,q})m = \frac{1}{2}\left(jm + \sqrt{jm\,[jm + 4q(1-q)(1-jm)]}\right) \le \lambda_{RQGS} < 1 .$$
Finally, for RSGS with component selection probability $p$, define

$$v_{RSGS,p} = \frac{(2p-1) + \sqrt{1 - 4p(1-p)(1-jm)}}{2(1-p)m} .$$

Then

$$P_{RSGS,p}V_{RSGS}(x,y) \le \lambda_{RSGS}V_{RSGS}(x,y) + b_{RSGS}$$

holds for $V_{RSGS}(x,y) = f(x) + v_{RSGS,p}\,g(y)$, $b_{RSGS} = pk + (1-p)v_{RSGS,p}\,n$, and

$$(1-p)(1 + v_{RSGS,p}\,m) = \frac{1}{2}\left(1 + \sqrt{1 - 4p(1-p)(1-jm)}\right) \le \lambda_{RSGS} < 1 .$$
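
The lower bounds on the drift rates in Lemma 2 depend on $j$ and $m$ only through the product $jm$, so they are easy to tabulate. The sketch below evaluates the smallest admissible $\lambda_{CGS}$, $\lambda_{RQGS}$, and $\lambda_{RSGS}$ over a grid of selection probabilities. The function names and the values $j = m = 0.5$ are hypothetical and used only for illustration.

```python
import math

def lambda_cgs(j, m):
    """Smallest admissible CGS drift rate: jm."""
    return j * m

def lambda_rqgs(j, m, q):
    """Smallest admissible RQGS drift rate for sequence selection probability q."""
    a = j * m
    return 0.5 * (a + math.sqrt(a * (a + 4 * q * (1 - q) * (1 - a))))

def lambda_rsgs(j, m, p):
    """Smallest admissible RSGS drift rate for component selection probability p."""
    a = j * m
    return 0.5 * (1 + math.sqrt(1 - 4 * p * (1 - p) * (1 - a)))

j, m = 0.5, 0.5  # hypothetical constants with jm = 0.25 < 1
grid = (0.1, 0.25, 0.5, 0.75, 0.9)
for w in grid:
    # The squared RSGS rate corresponds to two RSGS iterations.
    print(w, lambda_rqgs(j, m, w), lambda_rsgs(j, m, w), lambda_rsgs(j, m, w) ** 2)
```

Under these bounds the RQGS rate grows toward $q = 1/2$ and the RSGS rate shrinks toward $p = 1/2$, while both reduce to boundary values ($jm$ and 1, respectively) as the selection probability approaches 0 or 1.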

In constructing the functions $f$ and $g$ required by Theorem 5, keep in mind the following guidelines.
First, the conditional expectations of $f$ and $g$ must maintain a cyclic-type relationship (6). Functions
satisfying this requirement can often be found by exploring lower moments of the conditional distributions
of $X|Y$ and $Y|X$. Next, Lemma 2 demonstrates that CGS, RQGS, and RSGS drift functions can each be
constructed as linear combinations of $f$ and $g$ (further evidence of systematic connections between their
convergence behaviors). Recall that the Markov chain will drift toward values for which the drift function
is small. Thus attention should be focused on functions $f$ and $g$ that take on small values in the center
of the state space where density $\pi$ is largest.

These concerns regarding $f$ and $g$ are specific to Theorem 5, which presents a single, but not exhaustive,
set of sufficient conditions for geometric ergodicity. In turn, the drift conditions and drift rates provided
by Lemma 2 are not unique. However, as smaller drift rates are loosely indicative of faster convergence,
$\lambda_{CGS}$, $\lambda_{RQGS}$, and $\lambda_{RSGS}$ provide interesting insight into the convergence
relationships between and within the CGS, RQGS, and RSGS. To this end, first notice the dependence of RQGS
drift rate $\lambda_{RQGS}$ on $q$. Mainly, $\lambda_{RQGS}$ increases as $q$ approaches 1/2 and converges to its
lower bound, $\lambda_{CGS} = jm$, as $q$ approaches 0 or 1. This suggests that the RQGS drift is quickest when
one of the update orders is strongly favored over the other, that is, when RQGS behaves like CGS. Similarly,
$\lambda_{RSGS}$ is minimized (hence drift is quickest) when $p = 1 - p = 1/2$, that is, when updates of $X$ and
$Y$ are roughly balanced. It is in this setting that the RSGS behaves most like CGS. Finally, we can compare
the CGS, RQGS, and RSGS drift rates. Indeed, since the RSGS requires at least two iterations to update both
$X$ and $Y$ whereas the CGS and RQGS require only one, a more fair comparison might be among
$\lambda_{CGS}$, $\lambda_{RQGS}$, and $\lambda_{RSGS}^2$, the drift rate corresponding to the two-step RSGS drift
condition:

$$\begin{aligned}
P^2_{RSGS,p}V_{RSGS}(x,y) &= P_{RSGS,p}\left(P_{RSGS,p}V_{RSGS}(x,y)\right) \\
&\le P_{RSGS,p}\left(\lambda_{RSGS}V_{RSGS}(x,y) + b_{RSGS}\right) \\
&\le \lambda_{RSGS}^2 V_{RSGS}(x,y) + b_{RSGS}(1 + \lambda_{RSGS}) .
\end{aligned}$$

Given the definitions in Lemma 2, it follows that
$\lambda_{CGS} \le \lambda_{RQGS} \le \lambda_{RSGS}^2 \le \lambda_{RSGS}$. Though this seems
to suggest that the CGS converges quicker than the RQGS which converges quicker than the RSGS (both
the original and two-step versions), we again caution against placing too much importance on interpreting
this single set of possible $\lambda$.



4 Examples 



We illustrate our results using two examples. The first is a toy example of GS for a Normal-Normal
model. Included is a simulation study which explores the impact of scanning strategy on the empirical
quality of finite GS for this model. The second considers GS for a special case of the Bayesian general
linear model studied by Johnson and Jones (2010). This model is practically relevant in that inference
for the corresponding Bayesian posterior distribution requires MCMC.



4.1 A Normal-Normal Model 

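Let $X = (X_1, X_2, \ldots, X_N) \in \mathbb{R}^N$ be an independent, identically distributed sample such that
$X_i | Y \sim N(Y, \theta^2)$ for each $i$ and $Y \in \mathbb{R}$ follows a $N(0, \tau^2)$ distribution. Thus, the
joint distribution of $(X,Y)$ is multivariate Normal with

$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N_{N+1}\left( \begin{pmatrix} 0_N \\ 0 \end{pmatrix},
\begin{pmatrix} \theta^2 I_N + \tau^2 1_N 1_N^T & \tau^2 1_N \\ \tau^2 1_N^T & \tau^2 \end{pmatrix} \right)$$

where $I_N$ is the $N$-dimensional identity matrix and $0_N$ and $1_N$ are $N$-dimensional vectors of zeroes and
ones, respectively.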

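Inference for the Normal-Normal model does not require MCMC. However, this model provides a nice
setting in which to illustrate our results for two-component GS. Let
$\Phi = \{(X^{(i)},Y^{(i)})\}_{i=0}^{\infty}$ be the GS chain which evolves by drawing from the conditional
distributions

$$X | Y \sim N_N\left(Y 1_N, \theta^2 I_N\right)$$

$$Y | X \sim N\left( \frac{\tau^2}{N\tau^2 + \theta^2} \sum_{i=1}^N X_i , \; \frac{\theta^2\tau^2}{N\tau^2 + \theta^2} \right)$$

with first and second moments

$$E(X_i | Y) = Y$$

$$E(X_i^2 | Y) = \theta^2 + Y^2$$

$$E(Y | X) = \frac{\tau^2}{N\tau^2 + \theta^2} \sum_{i=1}^N X_i$$

$$E(Y^2 | X) = \frac{\theta^2\tau^2}{N\tau^2 + \theta^2} + \left(\frac{\tau^2}{N\tau^2 + \theta^2}\right)^2 \left(\sum_{i=1}^N X_i\right)^2 .$$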

The GS and Normal-Normal density clearly meet the conditions of Assumption A: the GS is Harris
ergodic, the support of the Normal-Normal density is $\mathbb{R}^{N+1}$, which has non-empty interior with
respect to Lebesgue measure, and the density is continuous, hence satisfies condition (c). Thus to establish
geometric ergodicity for the CGS, RQGS, and RSGS we need only find functions $f$ and $g$ that satisfy the
conditions of Theorem 5. Per the discussion following Lemma 2, this choice can be guided by the lower
moments of the full conditionals. Further, $f$ and $g$ should be small for values near the center of the state
space. To this end, we know that the Normal conditional distributions of $X|Y$ and $Y|X$ have areas of higher
density near the values of $0_N$ and 0, respectively. With these guidelines in mind, consider defining

$$f(X) = \left(\sum_{i=1}^N X_i\right)^2 + 1 \quad \text{and} \quad g(Y) = Y^2 + 1$$

where 1 is added to $f$ and $g$ to ensure $f, g \ge 1$. Note that these satisfy the requirement that $f$ and
$g$ be small for values near $0_N$ and 0, respectively. Further, from the above conditional moments, it is
straightforward to show that $f$ and $g$ satisfy (6) with $j = N^2$, $k = N\theta^2 + 1$,

$$m = \left(\frac{\tau^2}{N\tau^2 + \theta^2}\right)^2 , \quad \text{and} \quad n = \frac{\theta^2\tau^2}{N\tau^2 + \theta^2} + 1 ,$$

so that $jm = \left(N\tau^2 / (N\tau^2 + \theta^2)\right)^2 < 1$. Finally,
$C_d := \{y : g(y) \le d\} = [-\sqrt{d-1}, \sqrt{d-1}]$, hence is compact for all $d > 0$. It follows from
Theorem 5 that the CGS, RQGS, and RSGS for the Normal-Normal model are geometrically ergodic.
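
The conditions of Theorem 5 can be checked numerically for particular parameter values. The sketch below computes one valid set of constants for $f(X) = (\sum_i X_i)^2 + 1$ and $g(Y) = Y^2 + 1$, derived from the conditional moments above, and evaluates both sides of (6) exactly. The function names are hypothetical, and the parameter values ($N = 10$, $\theta^2 = 1$, $\tau^2 \in \{1, 0.1\}$) are those of the simulation study in this section.

```python
def normal_normal_constants(N, theta2, tau2):
    """One valid choice of (j, k, m, n) in (6) for the Normal-Normal model."""
    denom = N * tau2 + theta2
    j = N ** 2
    k = N * theta2 + 1
    m = (tau2 / denom) ** 2
    n = theta2 * tau2 / denom + 1
    return j, k, m, n

def cond_mean_f(y, N, theta2):
    """E[f(X) | Y = y] = Var(sum X | y) + (E[sum X | y])^2 + 1."""
    return N * theta2 + (N * y) ** 2 + 1

def cond_mean_g(s, N, theta2, tau2):
    """E[g(Y) | X] written as a function of s = sum of the x_i."""
    denom = N * tau2 + theta2
    return theta2 * tau2 / denom + (tau2 / denom) ** 2 * s ** 2 + 1

for tau2 in (1.0, 0.1):
    j, k, m, n = normal_normal_constants(10, 1.0, tau2)
    # jm < 1 is the key requirement of Theorem 5
    print("tau2 =", tau2, "jm =", j * m)
```

For these two settings the product $jm$ is $100/121$ and $1/4$, respectively, so the smaller prior variance gives the smaller CGS drift rate.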
Though the CGS, RQGS, and RSGS are each geometrically ergodic, their exact convergence rates may
differ. We explore these discrepancies and their impact on finite sample empirical performance by
comparing the CGS, RQGS for $q \in \{0.10, 0.25, 0.50, 0.75, 0.90\}$, and RSGS for
$p \in \{0.10, 0.25, 0.50, 0.75, 0.90\}$ within two different parameter settings:

Setting   N    $\theta^2$   $\tau^2$   Var(X_i)   Var(Y)   Cor(X_i,X_j)   Cor(X_i,Y)
(1)       10   1            1          2          1        0.5            0.707
(2)       10   1            0.1        1.1        0.1      0.091          0.302

where the variance and correlation coefficients follow from the joint distribution of $(X,Y)$. Before
presenting our results, we remind the reader that GS convergence and performance depend both on scanning
strategy and target distribution. Thus the comparisons we make between the GS below should not be
generalized far beyond the specific Normal-Normal settings studied here.

To begin, consider one long run of each GS in both settings. Starting from $(X^{(0)},Y^{(0)}) = 0_{N+1}$, we
independently ran the CGS and RQGS for $10^5$ iterations and RSGS for $2 \times 10^5$ iterations since, again,
RSGS requires at least twice as many iterations as the CGS and RQGS to obtain the same number of
$X$ and $Y$ updates. Trace plots of the final 1000 $Y$ iterations for selected GS in Setting (1) are included
in Figure 1. The trace behavior is similar for the CGS and RQGS under the extreme settings of $q = 0.1$
and $q = 0.9$. On the other hand, as expected, the RSGS $Y$ sub-chain appears to mix more slowly than
for CGS and RQGS both when $p = 0.1$ ($Y$ is updated frequently) and, even worse, when $p = 0.9$ ($Y$ is
updated infrequently).

More formally, we can compare GS efficiency relative to the estimation of $E(Y) = 0$. Since
$E(Y^4) < \infty$, geometric ergodicity guarantees the existence of a Markov chain CLT for the Monte Carlo
average $\bar{Y}$,

$$\sqrt{n}\left(\bar{Y} - E(Y)\right) \xrightarrow{d} N(0, \sigma_Y^2) ,$$

along with a consistent estimator of $\sigma_Y^2$, $\hat{\sigma}_Y^2$, via batch means methods. Thus an
asymptotically valid 95% confidence interval (CI) for $E(Y)$ can be calculated by

$$\bar{Y} \pm 1.96\,\frac{\hat{\sigma}_Y}{\sqrt{n^*}}$$

where $n^*$ denotes the MCMC simulation length ($n^* = 10^5$ for CGS and RQGS and $n^* = 2 \times 10^5$ for RSGS).
Further, the integrated autocorrelation time

$$ACT = \frac{\sigma_Y^2}{Var(Y)} = \frac{\sigma_Y^2 / n^*}{Var(Y) / n^*}$$

can be consistently estimated by $\widehat{ACT} = \hat{\sigma}_Y^2 / Var(Y)$ and provides a measure of the GS
efficiency relative to that of a random sample from the Normal-Normal model. Specifically, the ACT indicates
the number of GS iterations required for each random sample draw in order to achieve the same level of
precision in estimating $E(Y)$.
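
The CI half-width and ACT computations can be sketched for the CGS in Setting (2) as follows. This is an illustration under stated assumptions, not the code used for the study: the batch size $\lfloor\sqrt{n}\rfloor$ is one common choice for batch means, and the function names and seed are hypothetical. For this model the $Y$ sub-chain under CGS is a Gaussian AR(1) process with coefficient $N\tau^2/(N\tau^2+\theta^2)$, which is 1/2 in Setting (2), so the estimate should land near $(1 + 1/2)/(1 - 1/2) = 3$.

```python
import math
import random

def cgs_y_chain(n_iter, N, theta2, tau2, seed=0):
    """Run the composition-scan GS for the Normal-Normal model; return the Y draws."""
    rng = random.Random(seed)
    denom = N * tau2 + theta2
    c = tau2 / denom                           # weight on sum(X) in E(Y | X)
    sd_y = math.sqrt(theta2 * tau2 / denom)    # sd of Y | X
    sd_x = math.sqrt(theta2)                   # sd of each X_i | Y
    y = 0.0                                    # start at the origin
    ys = []
    for _ in range(n_iter):
        xs = [rng.gauss(y, sd_x) for _ in range(N)]   # draw X | Y
        y = rng.gauss(c * sum(xs), sd_y)              # draw Y | X
        ys.append(y)
    return ys

def batch_means_var(draws):
    """Batch means estimate of the asymptotic variance in the Markov chain CLT."""
    n = len(draws)
    size = int(math.sqrt(n))                   # assumed batch size: floor(sqrt(n))
    n_batches = n // size
    used = draws[: n_batches * size]
    grand = sum(used) / len(used)
    means = [sum(used[i * size:(i + 1) * size]) / size for i in range(n_batches)]
    return size * sum((b - grand) ** 2 for b in means) / (n_batches - 1)

tau2 = 0.1
ys = cgs_y_chain(100_000, N=10, theta2=1.0, tau2=tau2, seed=1)
sigma2_hat = batch_means_var(ys)
y_bar = sum(ys) / len(ys)
half_width = 1.96 * math.sqrt(sigma2_hat / len(ys))   # 95% CI half-width
act_hat = sigma2_hat / tau2                           # Var(Y) = tau2 is known here
print(y_bar, half_width, act_hat)
```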

The 95% confidence intervals and ACT's for each GS in Settings (1) and (2) can be found in Tables
1 and 2, respectively. Across GS scanning strategy, the CI half-widths and ACT's are larger in Setting
(1) than in Setting (2). This is to be expected since the variance of $Y$ and its correlation with $X$ are
larger in Setting (1). Further, comparisons of the GS empirical performances are similar within both
settings and, interestingly, reflect the drift rate comparison discussion following Lemma 2. First, consider
the comparisons between CGS, RQGS, and RSGS. Nearly without exception, the CI half-widths and
ACT's are substantially larger for the RSGS than the RQGS which are slightly larger than, but roughly
comparable to, those for the CGS. These patterns suggest that, relative to the estimation of $E(Y)$, CGS
has a slight edge over RQGS and both are substantially more efficient than RSGS. Next, consider the
impact of selection probabilities $q$ and $p$ on the efficiencies of RQGS and RSGS, respectively. Within
the RQGS, the CI half-widths and ACT's tend to decrease at a similar rate as $q$ nears 0 or 1. In other
words, the RQGS is more efficient when either one of the update orders is heavily favored over the other
(i.e., when it behaves most like CGS). On the other hand, RSGS efficiency appears to improve as $p$ nears
0.5, that is, when $X$ and $Y$ are updated at a roughly similar rate. It is also interesting to note that, in
both Settings (1) and (2), RSGS performs relatively better when $p$ is small (i.e., $Y$ is updated frequently)
than when $p$ is large (i.e., $Y$ is updated infrequently). Thus in this specific Normal-Normal setting, there
does not seem to be an advantage to increasing the frequency of $Y$ updates as $Var(Y)$ (and $Cor(X_i, Y)$)
increases.

Finally, we compare the quality of the Monte Carlo averages $\bar{Y}$ in estimating $E(Y)$ using mean squared
error

$$MSE(\bar{Y}) = E\left(\bar{Y} - E(Y)\right)^2 = E\left(\bar{Y}^2\right) .$$

To estimate the MSEs, for each GS within each parameter setting, we performed 1000 independent runs
of either $10^4$ iterations each (CGS and RQGS) or $2 \times 10^4$ iterations each (RSGS) and recorded the
resulting independent estimates $\{\bar{Y}^{(1)}, \bar{Y}^{(2)}, \ldots, \bar{Y}^{(1000)}\}$. From these, we
estimate MSE by

$$\widehat{MSE}(\bar{Y}) = \frac{1}{1000}\sum_{i=1}^{1000} \left(\bar{Y}^{(i)}\right)^2 .$$

The results are reported in Tables 1 and 2 as MSE ratios relative to CGS

$$\frac{\widehat{MSE}_{RQGS,q}(\bar{Y})}{\widehat{MSE}_{CGS}(\bar{Y})} \quad \text{and} \quad
\frac{\widehat{MSE}_{RSGS,p}(\bar{Y})}{\widehat{MSE}_{CGS}(\bar{Y})}$$

where $\widehat{MSE}_{CGS}(\bar{Y})$ equals 0.00197 in Setting (1) and 0.0000295 in Setting (2). Examination of the
MSE ratios produces conclusions compatible with those from the CI half-widths and ACT's. Mainly, CGS
edges out RQGS (i.e., all ratios are greater than 1) and both are substantially more efficient than RSGS.
Further, RQGS is most efficient under $q$ values near 0 or 1 and RSGS is most efficient when $p = 0.5$.
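
A scaled-down version of this replication study can be sketched as below for Setting (2), comparing the CGS with the RSGS at $p = 0.5$. Two shortcuts are assumptions made here to keep the example fast and are not part of the original study: only 300 replications of 2000 (CGS) or 4000 (RSGS) iterations are used, and the $X$ update is carried through the sufficient statistic $S = \sum_i X_i$, with $S | Y \sim N(NY, N\theta^2)$, which leaves the law of the $Y$ sub-chain unchanged. All function names are hypothetical.

```python
import math
import random

def ybar_cgs(n_iter, N, theta2, tau2, rng):
    """Monte Carlo average of Y from one CGS run started at the origin."""
    denom = N * tau2 + theta2
    c, sd_y = tau2 / denom, math.sqrt(theta2 * tau2 / denom)
    sd_s = math.sqrt(N * theta2)
    y, total = 0.0, 0.0
    for _ in range(n_iter):
        s = rng.gauss(N * y, sd_s)    # S = sum(X) given Y
        y = rng.gauss(c * s, sd_y)    # Y given S
        total += y
    return total / n_iter

def ybar_rsgs(n_iter, p, N, theta2, tau2, rng):
    """Monte Carlo average of Y from one RSGS run started at the origin."""
    denom = N * tau2 + theta2
    c, sd_y = tau2 / denom, math.sqrt(theta2 * tau2 / denom)
    sd_s = math.sqrt(N * theta2)
    s, y, total = 0.0, 0.0, 0.0
    for _ in range(n_iter):
        if rng.random() < p:
            s = rng.gauss(N * y, sd_s)    # update X (through S) only
        else:
            y = rng.gauss(c * s, sd_y)    # update Y only
        total += y
    return total / n_iter

def mse_hat(estimates):
    """Estimated MSE for a target value of E(Y) = 0."""
    return sum(e * e for e in estimates) / len(estimates)

rng = random.Random(2)
REPS, N_ITER = 300, 2000
cgs_runs = [ybar_cgs(N_ITER, 10, 1.0, 0.1, rng) for _ in range(REPS)]
rsgs_runs = [ybar_rsgs(2 * N_ITER, 0.5, 10, 1.0, 0.1, rng) for _ in range(REPS)]
mse_cgs, mse_rsgs = mse_hat(cgs_runs), mse_hat(rsgs_runs)
ratio = mse_rsgs / mse_cgs
print(mse_cgs, mse_rsgs, ratio)
```

A ratio above 1 indicates that, even with twice as many iterations, the RSGS average is less precise than the CGS average in this setting.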



4.2 A Bayesian General Linear Model 



Johnson and Jones (2010) establish geometric ergodicity for the CGS for a popular Bayesian general
linear model. Thus by Theorem 4, the RQGS and RSGS are also geometrically ergodic. An inspection
of their proofs shows that the authors establish these results using the same techniques as those outlined
by Theorem 5. For ease of exposition, we illustrate this approach for a (very) special case of this model,
a Bayesian balanced random intercept model for $K$ subjects with $M$ observations on each. Specifically,
let $Y$ denote an $N \times 1$ response vector, $\beta$ a $p \times 1$ vector of regression coefficients, and $u$ a
$K \times 1$ vector. Further, let $X$ be an $N \times p$ design matrix of full column rank and
$Z = I_K \otimes 1_M$ where $\otimes$ denotes the Kronecker product and $1_M$ is an $M \times 1$ vector of ones.
Then the model is

$$\begin{aligned}
Y \mid \beta, u, \lambda_R, \lambda_D &\sim N_N\left(X\beta + Zu, \lambda_R^{-1} I_N\right) \\
\beta \mid u, \lambda_R, \lambda_D &\sim N_p\left(0, I_p\right) \\
u \mid \lambda_R, \lambda_D &\sim N_K\left(0, \lambda_D^{-1} I_K\right) \\
\lambda_R &\sim \text{Gamma}(2, 1) \\
\lambda_D &\sim \text{Gamma}(2, 1)
\end{aligned} \tag{8}$$

where we say $W \sim \text{Gamma}(2,1)$ if it has density proportional to $w e^{-w}$ for $w > 0$. We also assume
that $\beta$ and $u$ are conditionally independent given $\lambda_R$, $\lambda_D$, and $y$ (i.e., $X^T Z = 0$).
We can explore the posterior distribution of $\beta$, $u$, $\lambda_R$, and $\lambda_D$ given data $y$ using a
two-component GS with components $\xi = (u^T, \beta^T)^T$ and $\lambda = (\lambda_R, \lambda_D)^T$. Constructing
the corresponding Markov chain $\Phi = \{(\lambda^{(i)}, \xi^{(i)})\}_{i=0}^{\infty}$ requires draws from the
following full conditional distributions. Letting $v_1(\xi) = (y - X\beta - Zu)^T(y - X\beta - Zu)$ and
$v_2(\xi) = u^T u$,

$$\lambda \mid \xi, y \sim \text{Gamma}\left(2 + \frac{N}{2}, \, 1 + \frac{v_1(\xi)}{2}\right) \cdot
\text{Gamma}\left(2 + \frac{K}{2}, \, 1 + \frac{v_2(\xi)}{2}\right) .$$

That is, the conditional distribution of $\lambda = (\lambda_R, \lambda_D)$ given $(\xi, y)$ is the product of two
independent Gamma distributions. Further,

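$$
\xi \mid \lambda, y \sim N_{K+p}\left(\mu, \Sigma^{-1}\right)
$$

where

$$
\Sigma^{-1} = \begin{pmatrix} (\lambda_R M + \lambda_D)^{-1} I_K & 0 \\ 0 & (\lambda_R X^T X + I_p)^{-1} \end{pmatrix}
\quad \text{and} \quad
\mu = \lambda_R \Sigma^{-1} \begin{pmatrix} Z^T y \\ X^T y \end{pmatrix}.
\tag{9}
$$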
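
The full conditionals above translate into a two-component CGS update: draw $\lambda$ given $\xi$, then $\xi$ given $\lambda$. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the Gamma distributions are parameterized by shape and rate, uses $Z^T Z = M I_K$ and $X^T Z = 0$ so that $u$ and $\beta$ can be drawn separately, and introduces the hypothetical helpers `make_balanced_design` and `cgs_random_intercept` purely for illustration.

```python
import numpy as np

def make_balanced_design(K, M, p, rng):
    """Hypothetical helper: a design for K subjects with M observations each and X^T Z = 0."""
    X = rng.normal(size=(K, M, p))
    X -= X.mean(axis=1, keepdims=True)        # center within subject, which forces X^T Z = 0
    X = X.reshape(K * M, p)
    Z = np.kron(np.eye(K), np.ones((M, 1)))   # Z = I_K (Kronecker) 1_M
    return X, Z

def cgs_random_intercept(y, X, Z, n_iter, rng):
    """Sketch of the CGS: one iteration draws lambda | xi, y and then xi | lambda, y."""
    N, p = X.shape
    K = Z.shape[1]
    M = N // K
    XtX, Xty, Zty = X.T @ X, X.T @ y, Z.T @ y
    u, beta = np.zeros(K), np.zeros(p)        # arbitrary starting value for xi = (u, beta)
    lam_draws = np.empty((n_iter, 2))
    xi_draws = np.empty((n_iter, K + p))
    for t in range(n_iter):
        # lambda | xi, y: two independent Gammas (NumPy takes shape and scale = 1 / rate)
        resid = y - X @ beta - Z @ u
        v1, v2 = resid @ resid, u @ u
        lam_R = rng.gamma(2 + N / 2, 1 / (1 + v1 / 2))
        lam_D = rng.gamma(2 + K / 2, 1 / (1 + v2 / 2))
        # u | lambda, y: independent normals with common precision lam_R * M + lam_D
        prec_u = lam_R * M + lam_D
        u = lam_R * Zty / prec_u + rng.normal(size=K) / np.sqrt(prec_u)
        # beta | lambda, y: normal with precision matrix lam_R * X^T X + I_p
        prec_beta = lam_R * XtX + np.eye(p)
        chol = np.linalg.cholesky(prec_beta)  # prec_beta = chol @ chol.T
        mean_beta = np.linalg.solve(prec_beta, lam_R * Xty)
        beta = mean_beta + np.linalg.solve(chol.T, rng.normal(size=p))
        lam_draws[t] = lam_R, lam_D
        xi_draws[t] = np.concatenate([u, beta])
    return lam_draws, xi_draws
```

A call such as `cgs_random_intercept(y, X, Z, n_iter=10_000, rng=np.random.default_rng(0))` returns the $\lambda$ and $\xi$ draws as two arrays. Swapping the order of the two blocks inside the loop, or choosing one of them at random, would give the other scanning strategies.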



Accordingly, the GS and posterior density satisfy Assumption A and the following Lemma establishes the 
sufficient conditions required by Theorem 5. Geometric ergodicity of the CGS, RQGS, and RSGS follows. 
Please see Johnson and Jones (2010) for a proof of the Lemma. 



Lemma 3. Define 



./'(A) — K[-L + ^-)+r^^ i - + 1 



$$
g(\xi) = v_1(\xi) + v_2(\xi) + 1.
$$
These functions satisfy the conditions of Theorem 5 with $j = K/(2 + K)$, $m = 1$, and 

fc = ^ + 2K + 2^ 2+2 + 2 7V / 2+2 
iV + 2 Jf + 2 



N 

n 



Xixf + y T (l N + -^ZZ T \ y 
i=l V / 



where $x_i$ denotes the $i$th row of $X$. Further, $C_d := \{\xi : g(\xi) \le d\}$ is compact for all $d > 0$. 

Under geometric ergodicity, inference for the posterior can be guided by the existence of a CLT and 
consistent estimates of Monte Carlo standard errors. For details, examples, and further study of the 
convergence among the GS for this model, please see Johnson and Jones (2010) and Johnson et al. (2013). 

5 Appendix 
5.1 Preliminaries 

The following lemmas are applied extensively throughout the appendix. The first provides the notation 
and structure required for constructing CGS, RQGS, and RSGS drift conditions. 

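Lemma 4. Denote expectation with respect to the full conditional distributions as 

$$
\begin{aligned}
E[f(x,y) \mid x] &:= \int f(x,y)\,\pi(y|x)\,\mu_y(dy) \\
E[f(x,y) \mid y] &:= \int f(x,y)\,\pi(x|y)\,\mu_x(dx).
\end{aligned}
$$

Then for any function $f(x,y)$, 

$$
\begin{aligned}
P_{CGS} f(x,y) &= E\left[E[f(x',y') \mid x'] \mid y\right] \\
P_{RQGS,q} f(x,y) &= q\,E\left[E[f(x',y') \mid x'] \mid y\right] + (1-q)\,E\left[E[f(x',y') \mid y'] \mid x\right] \\
P_{RSGS,p} f(x,y) &= p\,E[f(x',y) \mid y] + (1-p)\,E[f(x,y') \mid x].
\end{aligned}
$$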

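Proof. First, it follows from the definition of $P_{CGS}$ that 

$$
\begin{aligned}
P_{CGS} f(x,y) &= \int\!\!\int f(x',y')\,\pi(x'|y)\,\pi(y'|x')\,\mu_x(dx')\,\mu_y(dy') \\
&= \int \pi(x'|y) \left[ \int f(x',y')\,\pi(y'|x')\,\mu_y(dy') \right] \mu_x(dx') \\
&= E\left[E[f(x',y') \mid x'] \mid y\right]
\end{aligned}
$$

and the RQGS proof is similar. Finally, 

$$
\begin{aligned}
P_{RSGS,p} f(x,y) &= p \int\!\!\int f(x',y')\,\pi(x'|y)\,\delta(y' - y)\,\mu_x(dx')\,\mu_y(dy') \\
&\quad + (1-p) \int\!\!\int f(x',y')\,\pi(y'|x)\,\delta(x' - x)\,\mu_x(dx')\,\mu_y(dy') \\
&= p \int f(x',y)\,\pi(x'|y)\,\mu_x(dx') + (1-p) \int f(x,y')\,\pi(y'|x)\,\mu_y(dy') \\
&= p\,E[f(x',y) \mid y] + (1-p)\,E[f(x,y') \mid x].
\end{aligned}
$$

□ 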
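
The three kernels in Lemma 4 correspond to three ways of combining the two conditional draws. The sketch below illustrates this on a toy target that is an assumption made for illustration only: a standard bivariate normal with correlation `RHO`, whose full conditionals are $x \mid y \sim N(\rho y, 1 - \rho^2)$ and $y \mid x \sim N(\rho x, 1 - \rho^2)$. The function names are hypothetical.

```python
import numpy as np

RHO = 0.5  # hypothetical correlation of the toy bivariate normal target

def draw_x_given_y(y, rng):
    return rng.normal(RHO * y, np.sqrt(1 - RHO**2))

def draw_y_given_x(x, rng):
    return rng.normal(RHO * x, np.sqrt(1 - RHO**2))

def cgs_step(x, y, rng):
    """Composition: x' ~ pi(.|y), then y' ~ pi(.|x')."""
    x_new = draw_x_given_y(y, rng)
    return x_new, draw_y_given_x(x_new, rng)

def rqgs_step(x, y, rng, q):
    """Random sequence scan: order (x, y) with probability q, order (y, x) otherwise."""
    if rng.random() < q:
        x_new = draw_x_given_y(y, rng)
        return x_new, draw_y_given_x(x_new, rng)
    y_new = draw_y_given_x(x, rng)
    return draw_x_given_y(y_new, rng), y_new

def rsgs_step(x, y, rng, p):
    """Random scan: update x only with probability p, update y only otherwise."""
    if rng.random() < p:
        return draw_x_given_y(y, rng), y
    return x, draw_y_given_x(x, rng)

def run_chain(step, n_iter, rng, **kwargs):
    """Run one of the step functions from (0, 0) and return the draws as an (n_iter, 2) array."""
    draws = np.empty((n_iter, 2))
    x, y = 0.0, 0.0
    for t in range(n_iter):
        x, y = step(x, y, rng, **kwargs)
        draws[t] = x, y
    return draws
```

For example, `run_chain(rqgs_step, 10_000, np.random.default_rng(0), q=0.25)` runs the RQGS. Note that one RSGS iteration updates a single component, whereas one CGS or RQGS iteration updates both.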



We will use the following Lemma to establish that our drift functions are unbounded off compact sets. 

Lemma 5. Suppose Assumption A holds and function $V : X \times Y \to [1, \infty)$ is unbounded off compact sets. 
Then $V_1 : X \times Y \to [1, \infty)$ and $V_2 : X \times Y \to [1, \infty)$ are also unbounded off compact sets where 

$$
\begin{aligned}
V_1(x,y) &= u\,E[V(x,y') \mid x] + v\,E[V(x',y) \mid y] \\
V_2(x,y) &= u\,E\left[E[V(x',y') \mid y'] \mid x\right] + v\,E\left[E[V(x',y') \mid x'] \mid y\right] + w\,E[V(x',y) \mid y]
\end{aligned}
$$

for $u, v, w \ge 0$ such that $u + v + w > 0$. 

Proof. Let $C_d := \{(x,y) : V(x,y) \le d\}$ where, by assumption, $C_d$ is compact for all $d > 0$. To establish 
that $V_1$ is unbounded off compact sets, we will prove that $D_d := \{(x,y) : V_1(x,y) \le d\}$ is also compact 
(i.e. closed and bounded) for all $d > 0$. 

First, we show $D_d$ is closed. Specifically, we show that if $\{(x_i,y_i)\}_{i=1}^{\infty} \subseteq D_d$ and $\lim_{n\to\infty} (x_n,y_n) = (x,y)$, 
then $(x,y)$ is also in $D_d$ (i.e. $V_1(x,y) \le d$). To this end, notice that for all $(x,y) \in X \times Y$ 

$$
V(x,y) \le \liminf_{n\to\infty} V(x_n,y) \quad \text{and} \quad V(x,y) \le \liminf_{n\to\infty} V(x,y_n)
$$

by the closedness of $C_d$ and, by Assumption A, 

$$
\pi(y|x) \le \liminf_{n\to\infty} \pi(y|x_n) \quad \text{and} \quad \pi(x|y) \le \liminf_{n\to\infty} \pi(x|y_n).
$$

Thus 

$$
\begin{aligned}
V_1(x,y) &= u\,E[V(x,y') \mid x] + v\,E[V(x',y) \mid y] \\
&= u \int V(x,y')\,\pi(y'|x)\,\mu_y(dy') + v \int V(x',y)\,\pi(x'|y)\,\mu_x(dx') \\
&\le u \int \liminf_{n\to\infty} V(x_n,y')\,\liminf_{n\to\infty} \pi(y'|x_n)\,\mu_y(dy') + v \int \liminf_{n\to\infty} V(x',y_n)\,\liminf_{n\to\infty} \pi(x'|y_n)\,\mu_x(dx') \\
&\le u \int \liminf_{n\to\infty} V(x_n,y')\,\pi(y'|x_n)\,\mu_y(dy') + v \int \liminf_{n\to\infty} V(x',y_n)\,\pi(x'|y_n)\,\mu_x(dx') \\
&\le \liminf_{n\to\infty} \left[ u \int V(x_n,y')\,\pi(y'|x_n)\,\mu_y(dy') + v \int V(x',y_n)\,\pi(x'|y_n)\,\mu_x(dx') \right] \\
&= \liminf_{n\to\infty} V_1(x_n,y_n) \\
&\le d
\end{aligned}
$$

where the penultimate inequality follows from Fatou's lemma and the final inequality is guaranteed by 
the assumption that $\{(x_i,y_i)\}_{i=1}^{\infty} \subseteq D_d$. 

Next, we show $D_d$ is bounded. To this end, consider two cases. First, suppose $V(x,y)$ attains a finite 
maximum value $m = \max_{(x,y) \in X \times Y} \{V(x,y)\} < \infty$. In this case, $X \times Y = \{(x,y) : V(x,y) \le m\} = C_m$ 
which is bounded by assumption. Since $D_d \subseteq X \times Y$ for all $d$, $D_d$ must also be bounded. On the other 
hand, suppose $V(x,y)$ does not attain a finite maximum. In this case, define 

$$
A_{(x,y)} = \{x' : V(x',y) \ge V(x,y)\} \quad \text{and} \quad B_{(x,y)} = \{y' : V(x,y') \ge V(x,y)\}
$$

and notice that for any $(x,y) \in X \times Y$, either $A_{(x,y)}$ has positive measure with respect to $\mu_x(dx)$ or $B_{(x,y)}$ 
has positive measure with respect to $\mu_y(dy)$. Thus, for all $(x,y)$ 

$$
\begin{aligned}
V_1(x,y) &= u \int V(x,y')\,\pi(y'|x)\,\mu_y(dy') + v \int V(x',y)\,\pi(x'|y)\,\mu_x(dx') \\
&\ge u \int_{B_{(x,y)}} V(x,y')\,\pi(y'|x)\,\mu_y(dy') + v \int_{A_{(x,y)}} V(x',y)\,\pi(x'|y)\,\mu_x(dx') \\
&\ge V(x,y) \left[ u \int_{B_{(x,y)}} \pi(y'|x)\,\mu_y(dy') + v \int_{A_{(x,y)}} \pi(x'|y)\,\mu_x(dx') \right] \\
&\ge c\,V(x,y)
\end{aligned}
$$

where $c := \min_{(x,y)} \left\{ u \int_{B_{(x,y)}} \pi(y'|x)\,\mu_y(dy') + v \int_{A_{(x,y)}} \pi(x'|y)\,\mu_x(dx') \right\} > 0$. It follows that for all $(x,y) \in D_d$, 
$(x,y)$ is also in $C_{d/c}$ since 

$$
V(x,y) \le \frac{V_1(x,y)}{c} \le \frac{d}{c}.
$$

Thus, $D_d \subseteq C_{d/c}$ and the boundedness of $D_d$ follows from the boundedness of $C_{d/c}$. The proof that $V_2$ is 
unbounded off compact sets is similar and thus omitted here. 

□ 

5.2 Proof of Lemma 1 

We prove here that CGS is Feller. The proofs for RQGS and RSGS are similar and thus omitted. First, 
Assumption A guarantees that for any $(x,y), (x_n,y_n), (x'_n,y'_n) \in X \times Y$ 

$$
\begin{aligned}
\liminf_{(x_n,y_n)\to(x,y)} k_{CGS}((x_n,y_n),(x'_n,y'_n)) &= \liminf_{(x_n,y_n)\to(x,y)} \pi(x'_n|y_n)\,\pi(y'_n|x'_n) \\
&\ge \pi(x'_n|y)\,\pi(y'_n|x'_n) \\
&= k_{CGS}((x,y),(x'_n,y'_n)).
\end{aligned}
$$

Thus the CGS is Feller since for any open set $O \in \mathcal{B}$, an application of Fatou's Lemma shows that 

$$
\begin{aligned}
\liminf_{(x_n,y_n)\to(x,y)} P_{CGS}((x_n,y_n), O) &= \liminf_{(x_n,y_n)\to(x,y)} \int\!\!\int_O k_{CGS}((x_n,y_n),(x'_n,y'_n))\,\mu_x(dx'_n)\,\mu_y(dy'_n) \\
&\ge \int\!\!\int_O \liminf_{(x_n,y_n)\to(x,y)} k_{CGS}((x_n,y_n),(x'_n,y'_n))\,\mu_x(dx'_n)\,\mu_y(dy'_n) \\
&\ge \int\!\!\int_O k_{CGS}((x,y),(x'_n,y'_n))\,\mu_x(dx'_n)\,\mu_y(dy'_n) \\
&= P_{CGS}((x,y), O).
\end{aligned}
$$



5.3 Proof of Theorem 2 

Geometric ergodicity of RQGS with sequence selection probability $q$ guarantees the existence of drift 
function $V : X \times Y \to [1, \infty)$, $\lambda \in (0, 1)$, and finite constant $b \ge 0$ such that $V$ is unbounded off compact 
sets and 

$$
P_{RQGS,q} V(x,y) = q\,E\left[E[V(x',y') \mid x'] \mid y\right] + (1-q)\,E\left[E[V(x',y') \mid y'] \mid x\right] \le \lambda V(x,y) + b \tag{10}
$$

where the equality follows from Lemma 4. Without loss of generality, assume $\lambda > \max\{q, 1-q\}$ since if 
(10) holds for $\lambda \le \max\{q, 1-q\}$, it must also hold for $\lambda > \max\{q, 1-q\}$. 

To establish geometric ergodicity for CGS, define 

$$
\begin{aligned}
g(x) &= E\left[E[V(x',y') \mid y'] \mid x\right] \\
h(y) &= E\left[E[V(x',y') \mid x'] \mid y\right] \\
z(y) &= E[V(x',y) \mid y]
\end{aligned}
$$

and constants $v$ and $w$ such that 

$$
\frac{\lambda(\lambda - q)}{q^2} < w < \frac{\lambda(1-q)}{q^2} \quad \text{and} \quad \frac{\lambda(1-q)}{q^2} - w < v < \frac{\lambda(1-q) - (\lambda - (1-q))\,q\,w}{\lambda q}.
$$

Also, define $\tilde V(x,y) = v\,g(x) + h(y) + w\,z(y) \ge 1$. By Lemma 5, $\tilde V$ is unbounded off compact sets. Thus, 
geometric ergodicity of the CGS will follow from establishing the CGS drift condition 

$$
P_{CGS} \tilde V(x,y) \le \tilde\lambda\,\tilde V(x,y) + \tilde b
$$

for $\tilde b = b(v + w)/(1 - q)$ and 

$$
\max\left\{ \frac{q + \lambda}{q} - (v + w)\frac{q}{1-q},\ \frac{\lambda}{w}\left( \frac{v + w}{1-q} - \frac{1}{q} \right) \right\} \le \tilde\lambda < 1.
$$

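To this end, first notice that the RQGS drift condition guarantees 

$$
q\,h(y) + (1-q)\,g(x) = P_{RQGS,q} V(x,y) \le \lambda V(x,y) + b.
$$

In conjunction with Lemma 4, it follows that 

$$
\begin{aligned}
P_{CGS}\,g(x) &= E\left[E[g(x') \mid x'] \mid y\right] = E[g(x') \mid y] \\
&= \frac{1}{1-q} E\left[(1-q)\,g(x') + q\,h(y) \mid y\right] - \frac{q}{1-q} h(y) \\
&\le \frac{1}{1-q} E\left[\lambda V(x',y) + b \mid y\right] - \frac{q}{1-q} h(y) \\
&= \frac{\lambda}{1-q} z(y) - \frac{q}{1-q} h(y) + \frac{1}{1-q} b
\end{aligned}
$$

and 

$$
\begin{aligned}
P_{CGS}\,h(y) &= E\left[E[h(y') \mid x'] \mid y\right] \\
&= \frac{1}{q} E\left[E[q\,h(y') + (1-q)\,g(x') \mid x'] \mid y\right] - \frac{1-q}{q} E\left[E[g(x') \mid x'] \mid y\right] \\
&\le \frac{1}{q} E\left[E[\lambda V(x',y') + b \mid x'] \mid y\right] - \frac{1-q}{q} E\left[E[g(x') \mid x'] \mid y\right] \\
&= \frac{\lambda}{q} h(y) - \frac{1-q}{q} P_{CGS}\,g(x) + \frac{1}{q} b
\end{aligned}
$$

and 

$$
\begin{aligned}
P_{CGS}\,z(y) &= E\left[E[z(y') \mid x'] \mid y\right] \\
&= E\left[E\left[E[V(x,y') \mid y'] \mid x'\right] \mid y\right] \\
&= E[g(x') \mid y] \\
&= E\left[E[g(x') \mid x'] \mid y\right] \\
&= P_{CGS}\,g(x).
\end{aligned}
$$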
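
Thus 

$$
\begin{aligned}
P_{CGS} \tilde V(x,y) &= v\,P_{CGS}\,g(x) + P_{CGS}\,h(y) + w\,P_{CGS}\,z(y) \\
&\le \left( v + w - \frac{1-q}{q} \right) P_{CGS}\,g(x) + \frac{\lambda}{q} h(y) + \frac{1}{q} b \\
&\le \left( \frac{q + \lambda}{q} - (v + w)\frac{q}{1-q} \right) h(y) + \frac{\lambda}{w} \left( \frac{v + w}{1-q} - \frac{1}{q} \right) w\,z(y) + \frac{v + w}{1-q} b \\
&\le \tilde\lambda \left( h(y) + w\,z(y) \right) + \tilde b \\
&\le \tilde\lambda\,\tilde V(x,y) + \tilde b
\end{aligned}
$$

and the result holds. 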
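
As a numerical illustration of the constants in this proof, the sketch below picks $w$ and $v$ at the midpoints of their admissible intervals and evaluates the resulting lower bound on $\tilde\lambda$ together with $\tilde b$. It assumes the inequalities for $v$, $w$, $\tilde\lambda$, and $\tilde b$ read as stated above. The function name and the midpoint choice are illustrative assumptions, not part of the proof.

```python
def cgs_drift_from_rqgs(lam, q, b):
    """Given an RQGS drift rate lam and constant b, return (v, w, lam_tilde, b_tilde) for CGS."""
    if not (0 < q < 1 and max(q, 1 - q) < lam < 1):
        raise ValueError("need 0 < q < 1 and max(q, 1 - q) < lam < 1")
    # admissible interval for w, then midpoint (an arbitrary illustrative choice)
    w_lo = lam * (lam - q) / q**2
    w_hi = lam * (1 - q) / q**2
    w = 0.5 * (w_lo + w_hi)
    # admissible interval for v given w, then midpoint
    v_lo = lam * (1 - q) / q**2 - w
    v_hi = (lam * (1 - q) - (lam - (1 - q)) * q * w) / (lam * q)
    v = 0.5 * (v_lo + v_hi)
    # smallest admissible CGS drift rate and the matching constant
    lam_tilde = max((q + lam) / q - (v + w) * q / (1 - q),
                    (lam / w) * ((v + w) / (1 - q) - 1 / q))
    b_tilde = b * (v + w) / (1 - q)
    return v, w, lam_tilde, b_tilde
```

For instance, `cgs_drift_from_rqgs(0.9, 0.5, 1.0)` uses $w = 1.62$ and $v = 0.23$ and gives a CGS drift rate of about 0.95, which is below 1 as required.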



5.4 Proof of Theorem 3 

Geometric ergodicity of RSGS with component selection probability $p$ guarantees the existence of drift 
function $V : X \times Y \to [1, \infty)$, $\lambda \in (0, 1)$, and finite constant $b \ge 0$ such that $V$ is unbounded off compact 
sets and 

$$
P_{RSGS,p} V(x,y) = p\,E[V(x',y) \mid y] + (1-p)\,E[V(x,y') \mid x] \le \lambda V(x,y) + b
$$

where the equality holds from Lemma 4. Without loss of generality, we assume $\lambda > \max\{p, 1-p\}$ (see 
the proof of Theorem 2). 

To establish geometric ergodicity for CGS, define 

$$
g(x) = E[V(x,y') \mid x] \quad \text{and} \quad h(y) = E[V(x',y) \mid y]
$$

and constant 

$$
v > \frac{p(\lambda - p)}{\lambda(1 - \lambda)}.
$$

Also, define $\tilde V(x,y) = g(x) + v\,h(y) \ge 1$. It follows from Lemma 5 that $\tilde V$ is unbounded off compact sets. 
We will also show that $\tilde V$ satisfies the CGS drift condition 

$$
P_{CGS} \tilde V(x,y) \le \tilde\lambda\,\tilde V(x,y) + \tilde b
$$

for 

$$
\frac{\lambda - p}{v(1-p)} + \frac{(\lambda - p)(\lambda + p - 1)}{p(1-p)} \le \tilde\lambda < 1
$$
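and $\tilde b = b(\lambda v + p)/(p(1-p))$. Geometric ergodicity of the CGS will follow. 
First, notice that the RSGS drift condition guarantees 

$$
p\,h(y) + (1-p)\,g(x) = P_{RSGS,p} V(x,y) \le \lambda V(x,y) + b.
$$

Thus 

$$
\begin{aligned}
P_{CGS}\,g(x) &= E\left[E[g(x') \mid x'] \mid y\right] = E[g(x') \mid y] \\
&= \frac{1}{1-p} E\left[(1-p)\,g(x') + p\,h(y) \mid y\right] - \frac{p}{1-p} h(y) \\
&\le \frac{1}{1-p} E\left[\lambda V(x',y) + b \mid y\right] - \frac{p}{1-p} h(y) \\
&= \frac{\lambda - p}{1-p} h(y) + \frac{1}{1-p} b
\end{aligned}
$$

and 

$$
\begin{aligned}
P_{CGS}\,h(y) &= E\left[E[h(y') \mid x'] \mid y\right] \\
&= E\left[ \frac{1}{p} E\left[p\,h(y') + (1-p)\,g(x') \mid x'\right] - \frac{1-p}{p} g(x') \,\Big|\, y \right] \\
&\le E\left[ \frac{1}{p} E\left[\lambda V(x',y') + b \mid x'\right] - \frac{1-p}{p} g(x') \,\Big|\, y \right] \\
&= \frac{\lambda + p - 1}{p} E[g(x') \mid y] + \frac{1}{p} b \\
&\le \frac{(\lambda - p)(\lambda + p - 1)}{p(1-p)} h(y) + \frac{\lambda}{p(1-p)} b.
\end{aligned}
$$
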
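Combining these results establishes the CGS drift condition: 

$$
\begin{aligned}
P_{CGS} \tilde V(x,y) &= P_{CGS}\,g(x) + v\,P_{CGS}\,h(y) \\
&\le \left[ \frac{\lambda - p}{v(1-p)} + \frac{(\lambda - p)(\lambda + p - 1)}{p(1-p)} \right] v\,h(y) + \left( \frac{\lambda v + p}{p(1-p)} \right) b \\
&\le \left[ \frac{\lambda - p}{v(1-p)} + \frac{(\lambda - p)(\lambda + p - 1)}{p(1-p)} \right] \left[ g(x) + v\,h(y) \right] + \left( \frac{\lambda v + p}{p(1-p)} \right) b \\
&\le \tilde\lambda\,\tilde V(x,y) + \tilde b.
\end{aligned}
$$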
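
The constants in this proof can be checked numerically in the same spirit. The sketch below assumes the lower bound on $\tilde\lambda$ and the formula for $\tilde b$ as stated above, and takes $v$ to be a multiple `slack` of the threshold $p(\lambda - p)/(\lambda(1 - \lambda))$, the value at which that lower bound equals 1. The function name and the `slack` parameter are illustrative assumptions.

```python
def cgs_drift_from_rsgs(lam, p, b, slack=2.0):
    """Given an RSGS drift rate lam and constant b, return (v, lam_tilde, b_tilde) for CGS."""
    if not (0 < p < 1 and max(p, 1 - p) < lam < 1):
        raise ValueError("need 0 < p < 1 and max(p, 1 - p) < lam < 1")
    # threshold for v: at v = v_min the lower bound on the CGS drift rate is exactly 1
    v_min = p * (lam - p) / (lam * (1 - lam))
    v = slack * v_min                      # any slack > 1 gives a drift rate below 1
    lam_tilde = (lam - p) / (v * (1 - p)) + (lam - p) * (lam + p - 1) / (p * (1 - p))
    b_tilde = b * (lam * v + p) / (p * (1 - p))
    return v, lam_tilde, b_tilde
```

With `lam = 0.9`, `p = 0.5`, and the default `slack`, this gives a CGS drift rate of 0.82. Larger values of `slack` lower the rate further at the price of a larger $\tilde b$.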

5.5 Proof of Theorem 5 

First, consider the CGS Markov chain $\Phi := \{(X^{(0)}, Y^{(0)}), (X^{(1)}, Y^{(1)}), (X^{(2)}, Y^{(2)}), \ldots\}$ and its $y$ sub-chain 
$\Phi_y := \{Y^{(0)}, Y^{(1)}, Y^{(2)}, \ldots\}$ with Mtd and Markov kernel 

$$
\begin{aligned}
k_{CGS,y}(y,y') &= \int \pi(x'|y)\,\pi(y'|x')\,\mu_x(dx') = \int k_{CGS}((x,y),(x',y'))\,\mu_x(dx') \\
P_{CGS,y}(y,A) &= \int_A k_{CGS,y}(y,y')\,\mu_y(dy') = P_{CGS}((x,y), X \times A).
\end{aligned}
$$

Notice that for any $g : Y \to \mathbb{R}$, 

$$
\begin{aligned}
P_{CGS,y}\,g(y) &= \int g(y')\,k_{CGS,y}(y,y')\,\mu_y(dy') \\
&= \int\!\!\int g(y')\,k_{CGS}((x,y),(x',y'))\,\mu_x(dx')\,\mu_y(dy') \\
&= P_{CGS}\,g(y).
\end{aligned}
$$

It is also well known that, in this two-component setting, if $\Phi_y$ is geometrically ergodic, so is $\Phi$ (Roberts 
and Rosenthal (2001)). Thus, it suffices to establish geometric ergodicity for $\Phi_y$. To this end, let $V_{CGS}$, 
$\lambda_{CGS}$, and $b_{CGS}$ be as defined in Lemma 2. Then the following drift condition holds for both the CGS 
and its $y$ sub-chain: 

$$
\begin{aligned}
P_{CGS,y}\,g(y) = P_{CGS}\,g(y) &= E\left[E[g(y') \mid x'] \mid y\right] \\
&\le E[m f(x') + n \mid y] \\
&\le jm\,g(y) + (mk + n) \\
&= \lambda_{CGS}\,g(y) + b_{CGS}
\end{aligned}
$$

where $g(y) = V_{CGS}(x,y)$. Further, by the assumption that $C_d := \{y : g(y) \le d\}$ is compact for all $d > 0$, 
$g(y)$ is unbounded off compact sets for $\Phi_y$. Thus $\Phi_y$ (and $\Phi$) is geometrically ergodic. 
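
The drift chain above can be illustrated on a toy example. The sketch below is an assumption-laden illustration, not one of the paper's examples: it takes a standard bivariate normal target with correlation `rho` and the hypothetical choices $f(x) = x^2 + 1$ and $g(y) = y^2 + 1$. For this target $E[f(x') \mid y] = \rho^2 g(y) + 2(1 - \rho^2)$ and $E[g(y') \mid x'] = \rho^2 f(x') + 2(1 - \rho^2)$, so the two inequalities used in the proof hold with $j = m = \rho^2$ and $k = n = 2(1 - \rho^2)$, giving $jm = \rho^4 < 1$. The code compares a Monte Carlo estimate of $P_{CGS}\,g(y)$ with $jm\,g(y) + (mk + n)$.

```python
import numpy as np

def toy_drift_constants(rho):
    """Constants (j, k, m, n) for the toy bivariate normal with f(x) = x^2 + 1, g(y) = y^2 + 1."""
    j = m = rho**2
    k = n = 2 * (1 - rho**2)
    return j, k, m, n

def mc_cgs_drift(y, rho, n_draws, rng):
    """Monte Carlo estimate of P_CGS g(y) and the bound jm g(y) + (mk + n) at a fixed y."""
    sd = np.sqrt(1 - rho**2)
    x_new = rng.normal(rho * y, sd, size=n_draws)   # x' ~ pi(. | y)
    y_new = rng.normal(rho * x_new, sd)             # y' ~ pi(. | x')
    lhs = np.mean(y_new**2 + 1)                     # estimates E[E[g(y') | x'] | y]
    j, k, m, n = toy_drift_constants(rho)
    rhs = j * m * (y**2 + 1) + (m * k + n)
    return lhs, rhs
```

In this toy case the bound is attained with equality, so `lhs` and `rhs` should agree up to Monte Carlo error. The sets $\{y : y^2 + 1 \le d\}$ are compact, matching the compactness assumption used above.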

In conjunction with Theorem 4, geometric ergodicity of the CGS guarantees the same for RQGS and 
RSGS. Though unnecessary, it is still interesting to note that the conditions of Theorem 5 can be used to 
construct drift conditions for RQGS and RSGS. To this end, let $V_{RQGS}$, $\lambda_{RQGS}$, $b_{RQGS}$, and $v = v_{RQGS,q}$ 
be as defined in Lemma 2. Then the following RQGS drift condition holds: 

$$
\begin{aligned}
P_{RQGS,q} V_{RQGS}(x,y) &= q\,E\left[E[V_{RQGS}(x',y') \mid x'] \mid y\right] + (1-q)\,E\left[E[V_{RQGS}(x',y') \mid y'] \mid x\right] \\
&= q\,E\left[E[f(x') + v\,g(y') \mid x'] \mid y\right] + (1-q)\,E\left[E[f(x') + v\,g(y') \mid y'] \mid x\right] \\
&\le q\,E[(1 + vm) f(x') + vn \mid y] + (1-q)\,E[(j + v)\,g(y') + k \mid x] \\
&\le (1-q)(j + v)m\,f(x) + \frac{qj(1 + vm)}{v}\,v\,g(y) + b_{RQGS} \\
&= \lambda_{RQGS} V_{RQGS}(x,y) + b_{RQGS}
\end{aligned}
$$

where the final equality holds since $v$ is a solution to $(1-q)(j + v)m = qj(1 + vm)/v$. Further, defining 
$V_{RSGS}$, $\lambda_{RSGS}$, $b_{RSGS}$, and $v = v_{RSGS,p}$ by Lemma 2 produces the following RSGS drift condition: 

$$
\begin{aligned}
P_{RSGS,p} V_{RSGS}(x,y) &= p\,E[V_{RSGS}(x',y) \mid y] + (1-p)\,E[V_{RSGS}(x,y') \mid x] \\
&= p\,E[f(x') + v\,g(y) \mid y] + (1-p)\,E[f(x) + v\,g(y') \mid x] \\
&\le (1-p)(1 + vm)\,f(x) + \frac{p(j + v)}{v}\,v\,g(y) + pk + (1-p)vn \\
&= \lambda_{RSGS} V_{RSGS}(x,y) + b_{RSGS}
\end{aligned}
$$

where the final equality holds since $v$ is a solution to $(1-p)(1 + vm) = p(j + v)/v$. 

Figure 1: Trace plots of GS for Y in the Normal-Normal model of Section 4.1. Shown are the last 1000 
iterations of (a) $10^5$ CGS iterations, (b) $10^5$ RQGS iterations under q = 0.1 (dashed) and q = 0.9 (solid), 
and (c) $2 \times 10^5$ RSGS iterations under p = 0.1 (dashed) and p = 0.9 (solid). 

Table 1: Summary of GS for the Normal-Normal model of Section 4.1 under Setting (1). The 95% CI's 
and ACT's are calculated from single independent runs of the GS. MSE ratios relative to the CGS MSE 
of 0.00197 are estimated from 1000 independent runs of each GS. Standard errors for the MSE ratios are 
in parentheses. 

| Algorithm | 95% CI | ACT | MSE Ratio |
|---|---|---|---|
| CGS | 0.0067 ± 0.0273 | 19.289 | 1 |
| RQGS, q = 0.10 | 0.0054 ± 0.0287 | 21.241 | 1.217 (0.078) |
| RQGS, q = 0.25 | 0.0029 ± 0.0286 | 21.171 | 1.250 (0.081) |
| RQGS, q = 0.50 | 0.0153 ± 0.0343 | 30.382 | 1.365 (0.086) |
| RQGS, q = 0.75 | 0.0167 ± 0.0289 | 21.554 | 1.206 (0.078) |
| RQGS, q = 0.90 | -0.0188 ± 0.0274 | 19.459 | 1.158 (0.075) |
| RSGS, p = 0.10 | -0.0132 ± 0.0567 | 166.592 | 5.701 (0.368) |
| RSGS, p = 0.25 | 0.0223 ± 0.0425 | 93.622 | 2.705 (0.178) |
| RSGS, p = 0.50 | 0.0177 ± 0.0379 | 74.480 | 2.178 (0.138) |
| RSGS, p = 0.75 | -0.0485 ± 0.0440 | 100.246 | 2.662 (0.171) |
| RSGS, p = 0.90 | 0.0033 ± 0.0589 | 179.441 | 6.279 (0.409) |

Table 2: Summary of GS for the Normal-Normal model of Section 4.1 under Setting (2). The 95% CI's 
and ACT's are calculated from single independent runs of the GS. MSE ratios relative to the CGS MSE 
of 0.0000295 are estimated from 1000 independent runs of each GS. Standard errors for the MSE ratios 
are in parentheses. 

| Algorithm | 95% CI | ACT | MSE Ratio |
|---|---|---|---|
| CGS | -0.0004 ± 0.0034 | 2.996 | 1 |
| RQGS, q = 0.10 | 0.0002 ± 0.0035 | 3.103 | 1.060 (0.068) |
| RQGS, q = 0.25 | 0.0010 ± 0.0034 | 3.068 | 1.133 (0.070) |
| RQGS, q = 0.50 | -0.0027 ± 0.0039 | 4.006 | 1.264 (0.078) |
| RQGS, q = 0.75 | -0.0003 ± 0.0036 | 3.328 | 1.034 (0.063) |
| RQGS, q = 0.90 | -0.0024 ± 0.0034 | 2.935 | 1.050 (0.063) |
| RSGS, p = 0.10 | 0.0029 ± 0.0063 | 20.274 | 3.831 (0.228) |
| RSGS, p = 0.25 | 0.0024 ± 0.0047 | 11.668 | 2.094 (0.131) |
| RSGS, p = 0.50 | -0.0005 ± 0.0045 | 10.350 | 2.063 (0.126) |
| RSGS, p = 0.75 | -0.0030 ± 0.0059 | 17.983 | 3.077 (0.184) |
| RSGS, p = 0.90 | 0.0004 ± 0.0087 | 39.203 | 7.022 (0.440) |